At the LHC, associated top quark and Higgs boson production with a Higgs decay to bottom quarks
has long been a heavily disputed search channel. Recently, it has been found not to be viable. We
show how it can be observed by tagging massive Higgs and top jets. For this purpose we construct
boosted top and Higgs taggers for Standard Model processes in a complex QCD environment.


The main task of the LHC is to understand electroweak symmetry breaking, e.g. by confirming
or modifying the minimal Higgs mechanism of the Standard Model [1, 2]. In the Standard Model
as well as its typical perturbative extensions, electroweak precision data clearly prefer a
light Higgs boson [3], most likely well below the threshold of Higgs decays to W bosons. If
only because 68% of light Higgs bosons ($m_H = 120$ GeV) decay to bottom quarks, we should
look for this Higgs signature.

Over the past years, Higgs search strategies based on 
different production mechanisms have been developed. 

The dominant gluon-fusion production process cannot be combined with a decay to bottom
quarks because of its overwhelming QCD background. For this production process all hopes
rest on the Higgs decay to photons, with its challenging signal-to-background ratio.

Higgs production in weak boson fusion with a decay to bottoms challenges the Atlas and CMS
triggers. Combined with a decay to taus instead, it is one of the discovery channels,
provided analysis techniques like a central jet veto and collinear $\tau\tau$ mass
reconstruction work in the QCD environment of the LHC.

While at the Tevatron the associated ZH and WH production serves as a discovery channel,
at the LHC it is plagued by QCD backgrounds. Nevertheless, a recent study has shown that
using a fat Higgs jet (i.e. a jet from a massive particle decay with subjet structure) we
can extract WH/ZH production with $H \to b\bar{b}$ for a Higgs mass of 120 GeV at the
$\sim 4\sigma$ level using 30 fb$^{-1}$ of data.

Additional search channels like weak-boson-fusion production of jH [11] or WH [12] final
states combined with a decay $H \to b\bar{b}$ might be visible, but lack a final
experimental word. It is clear, though, that none of them will lead to a discovery in the
first years of LHC running.

Last but not least, the associated production of a top quark with a Higgs boson at the LHC
has a long history, usually in combination with a Higgs decay to bottoms. At some point it
was expected to be the leading discovery channel for a light Higgs boson [13], but recently
it has been removed from the Higgs discovery plots by Atlas and CMS [13]. Without
systematic uncertainties, Atlas quotes a significance of 1.8 to $2.2\sigma$ for
30 fb$^{-1}$. Due to a (too) low signal-to-background ratio $S/B \sim 1/9$ this channel
might not reach a $5\sigma$ significance for any luminosity. The main problems are the
combinatorial background of bottom jets and the lack of a truly distinctive kinematic
feature of the Higgs decay jets.

Any meaningful analysis of the Higgs sector has to test the Yukawa nature of the
Higgs-fermion couplings. In addition to the bottom Yukawa coupling discussed above we
expect to extract the top Yukawa coupling from its one-loop contributions to the
higher-dimensional $ggH$ and $\gamma\gamma H$ couplings. However, any kind of new heavy
particle will also contribute to both of them, which makes it hard to perform a
model-independent top coupling measurement. A measurement based on a direct (i.e. tree
level) production process is the only way to reliably measure the top Yukawa. All of these
arguments point to $t\bar{t}H$ production with the decay $H \to b\bar{b}$ as a prime
ingredient for understanding the Higgs sector at the LHC [5].

In this paper we show how, using fat jets, this Standard Model search channel can indeed
be extracted with reasonable statistical significance and, most importantly, a much
reduced sensitivity to systematic uncertainties. We solve the combinatorial problem in the
signal by constructing two fat jets; based on those we find plenty of kinematic
distributions which separate signal and background.

Fat jets have been studied in the framework of searches for strongly interacting W
bosons [14], supersymmetric particles, heavy resonances decaying to strongly boosted top
quarks [17], as well as the WH/ZH search mentioned above [9, 10]. For leptonic top quarks
they are similar to complex mass and momentum reconstruction tools. Top taggers have been
studied in high-$p_T$ contexts, but differ in their applicability once the top quarks are
only slightly boosted, $E/m_t \gtrsim 1$. Therefore, we construct Standard-Model Higgs and
top taggers for tagging in busy environments at moderately high $p_T$ and show how fat
Higgs as well as top jets can be used to identify a Standard Model Higgs signature.
Signal and backgrounds: We consider associated top and Higgs production with one hadronic
and one leptonic top decay. The latter allows the events to pass the Atlas and CMS
triggers. The main backgrounds are

    $pp \to t\bar{t}b\bar{b}$      irreducible QCD background
    $pp \to t\bar{t}Z$             irreducible Z-peak background
    $pp \to t\bar{t}$ + jets       includes fake bottoms                (2)

To account for higher-order effects we normalize our total signal rate to the
next-to-leading order prediction of 702 fb for $m_H = 120$ GeV. The $t\bar{t}b\bar{b}$
continuum background we normalize to 2.6 pb after the acceptance cuts $|y_b| < 2.5$,
$p_{T,b} > 20$ GeV and $R_{bb} > 0.8$ of the corresponding next-to-leading order
calculation. This conservative rate estimate for very hard events implies a K factor of
$\sigma_{\rm NLO}/\sigma_{\rm LO} = 2.3$, which we need to attach to our leading-order
background simulation, compared to $K = 1.57$ for the signal. Finally, the $t\bar{t}Z$
background at NLO is normalized to 1.1 pb [23]. For $t\bar{t}$ plus jets production we do
not apply a higher-order correction because the background rejection cuts drive it into
kinematic configurations in which a constant K factor cannot be used. Throughout this
analysis we use an on-shell top mass of 172.3 GeV. All hard processes we generate using
MadEvent [24], shower and hadronize via Herwig++ [25] (without $g \to b\bar{b}$ splitting)
and analyze with FastJet [26]. We have verified that we obtain consistent results for
signal and background using Alpgen [27] and Herwig 6.5 [28].
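
As a quick cross-check of the normalization described above, the following sketch rescales rates with a constant K factor. The function names, the per-event weight helper and its arguments are illustrative assumptions; the leading-order rates are inferred here from the quoted NLO rates and K factors and are not stated in the text.

```python
def lo_rate(sigma_nlo_fb: float, k_factor: float) -> float:
    """Leading-order rate implied by sigma_NLO = K * sigma_LO."""
    return sigma_nlo_fb / k_factor


def event_weight(sigma_nlo_fb: float, luminosity_fb: float, n_generated: int) -> float:
    """Per-event weight so that a generated sample sums to sigma_NLO * luminosity.

    n_generated is a hypothetical sample size, not a number from the text.
    """
    return sigma_nlo_fb * luminosity_fb / n_generated


signal_lo_fb = lo_rate(702.0, 1.57)   # ttH signal, m_H = 120 GeV
ttbb_lo_fb = lo_rate(2600.0, 2.3)     # ttbb continuum after acceptance cuts

print(f"implied LO rates: signal {signal_lo_fb:.0f} fb, ttbb {ttbb_lo_fb:.0f} fb")
```

Under these assumptions the implied leading-order rates are roughly 447 fb for the signal and 1130 fb for the continuum background.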

An additional background is W+jets production. The Wjj rate starts from roughly 15 nb with
$p_{T,j} > 20$ GeV. Asking for two very hard jets, mimicking the boosted Higgs and top
jets, and a leptonic W decay reduces this rate by roughly three orders of magnitude. Our
top tagger described below gives a mis-tagging probability around 5% including underlying
event, and the Higgs mass window gives another reduction by a factor 1/10, i.e. the final
Wjj rate without flavor tags ranges around 100 fb.

Adding two bottom tags we expect a purely fake-bottom contribution around 0.01 fb. To test
the general reliability of bottom tags in QCD background rejection we also simulate the
Wjj background including bottom quarks from the parton shower and find a remaining
background of O(0.1 fb), well below 10% of the $t\bar{t}$+jets background already for two
bottom tags. For three bottom tags it is essentially zero, so we neglect it in the
following.
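
The order-of-magnitude estimate for the Wjj background can be retraced with a few lines of arithmetic. The sketch below simply multiplies the quoted starting rate by the quoted suppression factors; the variable names are ours, and the 1% per-tag mis-tagging probability is the value assumed for filtered Higgs subjets in the flavor-tag discussion of this paper.

```python
FB_PER_NB = 1.0e6  # 1 nb = 10^6 fb

wjj_start_fb = 15.0 * FB_PER_NB   # Wjj rate with p_T,j > 20 GeV
hard_jets_and_lepton = 1.0e-3     # two very hard jets plus a leptonic W decay
top_mistag = 0.05                 # top tagger mis-tagging probability
higgs_window = 0.1                # reduction from the Higgs mass window
b_mistag = 0.01                   # per-jet b mis-tagging probability (assumption, see above)

# Rate before any flavor tag, then with two fake bottom tags.
wjj_no_flavor_fb = wjj_start_fb * hard_jets_and_lepton * top_mistag * higgs_window
wjj_two_fake_b_fb = wjj_no_flavor_fb * b_mistag ** 2

print(f"Wjj without flavor tags:  {wjj_no_flavor_fb:.0f} fb")
print(f"Wjj with two fake b tags: {wjj_two_fake_b_fb:.4f} fb")
```

The product comes out at about 75 fb and 0.0075 fb, consistent with the quoted "around 100 fb" and "around 0.01 fb" at the level of a rough estimate.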

The charm-flavored Wcj rate starts off with 1/6 of the purely mis-tagged Wjj rate. A
tenfold mis-tagging probability still leaves this background well below the effect of
bottoms from the parton shower. Finally, a lower limit $m_{bb}^{\rm rec} > 110$ GeV keeps
us safely away from CKM-suppressed $W \to bc$ decays where the charm is mis-identified as
a bottom jet.

Search strategy: The motivation for a $t\bar{t}H$ search with boosted heavy states can be
seen in Fig. 1: the leading top quark and the Higgs boson both carry sizeable transverse
momentum. We therefore first cluster
FIG. 1: Normalized top and Higgs transverse momentum spectra in $t\bar{t}H$ production
(solid). We also show $p_{T,H}$ in $W^-H$ production (dashed) and the $p_T$ of the harder
jet in $W^-jj$ production with $p_{T,j} > 20$ GeV (dotted).



the event with the Cambridge/Aachen (C/A) jet algorithm [29] using $R = 1.5$ and require
two or more hard jets and a lepton satisfying:



$$p_{T,j} > 200~\text{GeV}, \qquad |y_j^{(H)}| < 2.5, \qquad |y_j^{(t)}| < 4,$$
$$p_{T,\ell} > 15~\text{GeV}, \qquad |y_\ell| < 2.5. \qquad (3)$$

The maximum Higgs jet rapidity $|y_j^{(H)}|$ is limited by the requirement that it be
possible to tag its b-content. For lepton identification and isolation we assume an 80%
efficiency, in agreement with what we expect from a fast Atlas detector simulation. The
outline of our analysis is then as follows (cross sections at various stages are
summarized in Tab. I):

(1) one of the two jets should pass the top tagger (described below). If two jets pass we
choose the one whose top candidate is closer to the top mass.

(2) the Higgs tagger (also described below) runs over all remaining jets with
$|y| < 2.5$. It includes a double bottom tag.

(2') a third b tag can be applied in a separate jet analysis after removing the
constituents associated with the top and Higgs.

(3) to compute the statistical significance we require
$m_{bb}^{\rm rec} = m_H \pm 10$ GeV.

In this analysis, QCD $t\bar{t}$ plus jets production can fake the signal in three
distinct topologies. First, the Higgs candidate jet can arise from two mis-tagged QCD
jets. The total rate without flavored jets exceeds $t\bar{t}b\bar{b}$ production by a
factor of 200. This ratio can be balanced by the two b tags inside the Higgs resonance.
Secondly, there is an O(10%) probability for the bottom from the leptonic top decay to
leak into the Higgs jet and combine with a QCD jet to fake a Higgs candidate. This
topology is the most dangerous and can be essentially removed by a third b tag outside the
Higgs and top substructures. Finally, the bottom from the hadronic top can also leak
TABLE I: Number of events or $m_{bb}^{\rm rec}$ histogram entries per 1 fb$^{-1}$
including underlying event, assuming $m_H = 120$ GeV. The third row gives the number of
events with at least one subjet pairing in the Higgs mass window while the fourth row (and
below) gives the number of entries according to our algorithm based on the three leading
modified Jade distances.
into the Higgs jet after being replaced by a QCD jet with the appropriate kinematics in
the top reconstruction.



FIG. 2: Individually normalized $m_W^{\rm rec}$ and $m_t^{\rm rec}$ distributions for
signal and background (with underlying event).



These three distinct topologies appear in the $t\bar{t}$ background because of the
unusually large QCD jet activity, which corresponds to the huge QCD correction to the
total rate. The impact of these background configurations on our analysis critically
depends on the detailed simulation of QCD jet radiation in $t\bar{t}$ events. We therefore
perform our entire analysis for the minimal two b tags as well as for a safe scenario with
three b tags, to achieve a maximal reduction of this background.

Top and Higgs taggers: In contrast to other Higgs physics [9] or new physics
applications, our Higgs and top taggers cannot rely on a clean QCD environment: on the one
hand their initial cone size has to be large enough to accommodate only mildly boosted top
and Higgs states, so additional QCD jets will contaminate our fat jets [30]. On the other
hand, the small number of signal events does not allow any sharp rejection cuts for dirty
QCD events. Therefore, the taggers need to be built to survive busy LHC events.

Our starting point is the C/A jet algorithm with $R = 1.5$. For a top candidate, which
typically has a jet mass above 200 GeV, we assume that there could be a complex hard
substructure inside the fat jet. To reduce this fat jet to the relevant substructures we
apply the following recursive procedure. The last clustering of the jet $j$ is undone,
giving two subjets $j_1, j_2$, ordered such that $m_{j_1} > m_{j_2}$. If
$m_{j_1} > 0.8\, m_j$ (i.e. $j_2$ comes from the underlying event or soft QCD emission) we
discard $j_2$ and keep $j_1$, otherwise both $j_1$ and $j_2$ are kept; for each subjet
$j_i$ that is kept, we either add it to the list of relevant substructures (if
$m_{j_i} < 30$ GeV) or further decompose it recursively.
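
The recursive decomposition can be sketched in a few lines. The `Jet` class below is a hypothetical stand-in for a clustering history (it only stores a mass and the two parent subjets), not the FastJet interface; treating a subjet that cannot be split further as a final substructure is our assumption for the edge case the text does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Jet:
    mass: float                                    # jet mass in GeV
    parents: Optional[Tuple["Jet", "Jet"]] = None  # subjets from undoing the last clustering


def decompose(jet: Jet, mass_drop: float = 0.8, mass_cut: float = 30.0,
              out: Optional[List[Jet]] = None) -> List[Jet]:
    """Collect the relevant substructures of a fat jet by recursive mass-drop splitting."""
    if out is None:
        out = []
    if jet.parents is None:
        # Assumption: a jet without clustering history is kept as it is.
        out.append(jet)
        return out
    # Undo the last clustering and order the subjets by mass.
    j1, j2 = sorted(jet.parents, key=lambda j: j.mass, reverse=True)
    # No significant mass drop: j2 is soft, discard it and keep only j1.
    kept = [j1] if j1.mass > mass_drop * jet.mass else [j1, j2]
    for sub in kept:
        if sub.mass < mass_cut:
            out.append(sub)          # light enough: relevant substructure
        else:
            decompose(sub, mass_drop, mass_cut, out)
    return out


# Toy clustering tree with made-up masses (GeV).
w_like = Jet(80.0, (Jet(25.0), Jet(10.0)))
top_like = Jet(170.0, (w_like, Jet(20.0)))
fat_jet = Jet(210.0, (top_like, Jet(5.0)))
substructures = decompose(fat_jet)
```

In this toy tree the 5 GeV subjet is dropped at the first step, and three substructures with masses 25, 10 and 20 GeV survive. The Higgs tagger variant described later in the text would correspond to calling the same function with `mass_drop=0.9` and `mass_cut=40.0`.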

In the resulting set of relevant substructures, we examine all two-subjet configurations
to see if they could correspond to a W boson: after filtering as in Ref. [9] to reduce
contamination from the underlying event, the mass of the substructure pair should be in
the range $m_W^{\rm rec} = 65 - 95$ GeV (shown in Fig. 2). To tag the top quark, we then
add a third subjet and, again after filtering, require $m_t^{\rm rec} = 150 - 200$ GeV. We
additionally require that the W helicity angle $\theta^*$ with respect to the top
candidate satisfies $\cos\theta^* < 0.7$, as in Ref. [19]. For more than one top tag in
the event we choose the one with the smaller
$|m_t^{\rm rec} - m_t| + |m_W^{\rm rec} - m_W|$. The resulting top tagging efficiency in
the signal, including underlying event, is 43%, with a 5% mis-tagging probability in
W+jets events. Note that these values hold for only slightly boosted tops and in a
particularly complex QCD environment.
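
The mass-window part of the top tagger can be illustrated as follows. This is a simplified sketch under stated assumptions: it omits the filtering step and the helicity-angle cut, works on plain `(E, px, py, pz)` tuples, and uses a W mass of 80.4 GeV, which is our input and not a number quoted in the text.

```python
import itertools
import math

M_TOP = 172.3  # on-shell top mass used in the text (GeV)
M_W = 80.4     # assumed W mass (GeV), not quoted in the text


def inv_mass(*vecs):
    """Invariant mass of the sum of four-vectors (E, px, py, pz) in GeV."""
    e, px, py, pz = (sum(v[i] for v in vecs) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def best_top_candidate(subjets):
    """Return (score, m_top, m_W, triplet) for the best candidate, or None.

    A candidate needs a subjet pair with 65-95 GeV and a triplet with 150-200 GeV;
    the score is |m_t^rec - m_t| + |m_W^rec - m_W|, smaller being better.
    """
    best = None
    for triplet in itertools.combinations(subjets, 3):
        m_top = inv_mass(*triplet)
        if not 150.0 <= m_top <= 200.0:
            continue
        for pair in itertools.combinations(triplet, 2):
            m_w = inv_mass(*pair)
            if not 65.0 <= m_w <= 95.0:
                continue
            score = abs(m_top - M_TOP) + abs(m_w - M_W)
            if best is None or score < best[0]:
                best = (score, m_top, m_w, triplet)
    return best


# Toy four-vectors chosen only to hit the mass windows, not a physical decay.
toy_subjets = [(40.2, 40.2, 0.0, 0.0), (40.2, -40.2, 0.0, 0.0), (144.4, 0.0, 144.4, 0.0)]
candidate = best_top_candidate(toy_subjets)
```

For the toy input only the first two subjets pair up inside the W window, so the single candidate has a pair mass of 80.4 GeV and a triplet mass close to 172.3 GeV.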

In contrast to the top tagger, which identifies a top quark using its known mass and
properties, our Higgs tagger has to search for a Higgs peak in the reconstructed
$m_{bb}^{\rm rec}$ without any knowledge of the Higgs mass. We use the same decomposition
procedure described above (but now with a mass cutoff at 40 GeV and a mass drop threshold
of 0.9). We then order all possible pairs of subjets by the modified Jade distance



$$J = p_{T,1}\, p_{T,2}\, (\Delta R_{12})^4 \qquad (4)$$



similar to the mass of the hard splitting, but shifted towards larger jet separation. The
three leading pairings we filter and keep for the Higgs mass reconstruction. For these
events we explicitly confirm that indeed we are dominated by $p_{T,H} \gtrsim 200$ GeV.
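
A minimal sketch of the pair ordering by the modified Jade distance of Eq. (4) is given below. Subjets are represented as `(pT, y, phi)` tuples, an illustrative choice; we also read "leading" as "largest J", which is our interpretation of the ordering.

```python
import itertools
import math


def delta_r(a, b):
    """Distance in the rapidity-azimuth plane between two (pT, y, phi) subjets."""
    dy = a[1] - b[1]
    dphi = (a[2] - b[2] + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
    return math.hypot(dy, dphi)


def jade_distance(a, b):
    """Modified Jade distance J = pT1 * pT2 * (Delta R_12)^4."""
    return a[0] * b[0] * delta_r(a, b) ** 4


def leading_pairs(subjets, n=3):
    """Index pairs of the n subjet pairings with the largest modified Jade distance."""
    pairs = itertools.combinations(range(len(subjets)), 2)
    ranked = sorted(pairs,
                    key=lambda ij: jade_distance(subjets[ij[0]], subjets[ij[1]]),
                    reverse=True)
    return ranked[:n]


# Toy subjets as (pT in GeV, rapidity, azimuth).
toy = [(100.0, 0.0, 0.0), (80.0, 1.0, 0.0), (50.0, 0.0, 0.2)]
ranking = leading_pairs(toy)
```

In the toy example the hard, well-separated pair `(0, 1)` ranks first, while the collinear pair `(0, 2)` ranks last despite containing the hardest subjet, which illustrates the shift towards larger jet separation.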

Double vs triple bottom tag: At this stage we have not yet included any flavor tags to
control the $t\bar{t}$+jets and W+jets backgrounds. To reduce the leading $t\bar{t}jj$
topology we first require two bottom tags for the substructure pairings reconstructing the
Higgs. Based on the detector-level study we assume a 70% efficiency with a 1% mis-tagging
probability for b tags of filtered Higgs subjets.

We then apply a ±10 GeV mass window, after checking that the tails of the signal
distribution drop sharply, in particular towards larger mass values. In the double b-tag
analysis we find for an integrated luminosity of 100 fb$^{-1}$:
                      S       B       S/B      $S/\sqrt{B}$
    m_H = 115 GeV    120     380     1/3.2        6.2
          120 GeV    100     380     1/3.8        5.1
          130 GeV     51     330     1/6.5        2.8
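
The last two columns follow directly from the event counts S and B. The short check below recomputes them; the helper name is ours, and $S/\sqrt{B}$ is used as the simple statistical significance estimate shown in the table.

```python
import math

# (m_H in GeV, signal events S, background events B) for 100 fb^-1, double b tag
rows = [(115, 120, 380), (120, 100, 380), (130, 51, 330)]


def summarize(s, b):
    """Return (B/S, S/sqrt(B)): the x in S/B = 1/x and the simple significance."""
    return b / s, s / math.sqrt(b)


for m_h, s, b in rows:
    inv_sb, significance = summarize(s, b)
    print(f"m_H = {m_h} GeV: S/B = 1/{inv_sb:.1f}, S/sqrt(B) = {significance:.1f}")
```

Rounded to one decimal place, the recomputed values agree with the tabulated ones.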



This result shows that we can extract the $t\bar{t}H$ signal with high significance. On
the other hand, similar to the original Atlas and CMS analyses it suffers from low S/B,
the impact of the poorly understood $t\bar{t}$+jets background with its different
kinematic topologies, its large theory uncertainty and potentially large next-to-leading
order corrections, and the missing underlying event.

To improve the signal-to-background ratio S/B and remove the impact of the
$t\bar{t}$+jets background (at the expense of the final significance) we can apply a third
b tag. Targeting the second $t\bar{t}$+jets topology we remove the Higgs and top
constituents from the event and cluster the remaining particles into jets using the C/A
algorithm with $R = 0.6$, considering all jets with $p_T > 30$ GeV. Amongst these jets we
require one b tag with $|\eta| < 2.5$ and a distance $\Delta R_{b,j} > 0.4$ to the Higgs
and top subjets, assuming 60% efficiency and 2% purity. The last row of Table I confirms
that requiring three bottom tags leaves the continuum $t\bar{t}b\bar{b}$ production as the
only relevant background.

In Fig. 3 we show the signal from the three leading (by modified Jade distance)
$m_{bb}^{\rm rec}$ entries of double-b-tagged combinations; our Higgs tagger returns a
sharp mass peak. The bigger tail towards small $m_{bb}^{\rm rec}$ we can reduce by only
including the two leading jet combinations. This does not change the significance but
sculpts the background more. Assuming that at this stage we will know the Higgs mass, we
estimate the background from a clean right and a reasonably clean left side bin combined
with a next-to-leading order prediction. The result of the triple b-tag analysis is then
(again assuming
The numbers in parentheses are without underlying event. While removing the highly
uncertain $t\bar{t}$+jets background has indeed lowered the final significance, the
background of the three b-tag analysis is completely dominated by the well-behaved
$t\bar{t}b\bar{b}$ continuum production.

Further improvements: One of the problems in this analysis is that higher-order QCD
effects harm its reach. Turning this argument around, we can use the additional QCD
activity in the signal and continuum $t\bar{t}b\bar{b}$ background to improve our search.
Before starting with the fat-jet analysis we can, for example, analyze the four leading
jets with a radius $R = 0.6$ and $p_T < 40$ GeV and require a set of jet-jet and
jet-lepton separation criteria: we reject any event for which one of the three
FIG. 3: Reconstructed bottom-pair mass $m_{bb}^{\rm rec}$ for signal ($m_H = 120$ GeV) and
backgrounds without (upper) and including (lower) underlying event. The distributions
shown include three b tags.

conditions holds 

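$$\cos\theta^*_{j_2 j_1} < -0.4 \quad \text{and} \quad \Delta k_{T, j_3 \ell} \in [70, 160]~\text{GeV}$$
$$\cos\theta^*_{j_3 \ell} > 0.4 \quad \text{and} \quad \Delta R_{j_2 j_3} > 2.5$$
$$\Delta R_{j\ell} > 3.5 \quad \text{for any of the four leading jets.} \qquad (5)$$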

$\theta^*_{p_1 p_2}$ is the angle between $p_1$ in the center-of-mass frame of
$p_1 + p_2$ and the center-of-mass direction $(p_1 + p_2)$ in the lab frame. It is not
symmetric in its arguments; if the two particles are back to back and $|p_1| > |p_2|$ it
approaches $\cos\theta^* = 1$, whereas for $|p_1| < |p_2|$ it becomes $-1$ [32]. The
$k_T$ distance between two particles is
$(\Delta k_{T,j\ell})^2 = \min(p_{T,j}^2, p_{T,\ell}^2)\, \Delta R_{j\ell}^2$. At this
stage and with our limited means of detector simulation this QCD pre-selection at least
shows that there are handles to further improve S/B from 1/2.4 to roughly 1/2 (for
$m_H = 120$ GeV) with hardly any change to the final significance.
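
The two pre-selection observables can be written down compactly. The sketch below is an illustration under our own conventions: four-vectors are `(E, px, py, pz)` tuples, the function names are hypothetical, and the pair is assumed to have non-zero invariant mass and non-zero total momentum.

```python
import math


def cos_theta_star(p1, p2):
    """cos(theta*) of p1 in the rest frame of p1 + p2, relative to the
    lab-frame direction of p1 + p2. Not symmetric in its arguments."""
    E, Px, Py, Pz = (a + b for a, b in zip(p1, p2))
    M = math.sqrt(E * E - Px * Px - Py * Py - Pz * Pz)
    e, x, y, z = p1
    # Lorentz boost of the three-momentum of p1 into the rest frame of the pair.
    f = (x * Px + y * Py + z * Pz) / (M * (E + M)) - e / M
    bx, by, bz = x + f * Px, y + f * Py, z + f * Pz
    norm = math.sqrt(bx * bx + by * by + bz * bz) * math.sqrt(Px * Px + Py * Py + Pz * Pz)
    return (bx * Px + by * Py + bz * Pz) / norm


def delta_kt(pt_j, pt_l, delta_r):
    """k_T distance: sqrt(min(pT_j^2, pT_l^2)) * Delta R = min(pT_j, pT_l) * Delta R."""
    return min(pt_j, pt_l) * delta_r


# Back-to-back massless toy particles along the x axis.
hard = (3.0, 3.0, 0.0, 0.0)
soft = (1.0, -1.0, 0.0, 0.0)
forward = cos_theta_star(hard, soft)
backward = cos_theta_star(soft, hard)
```

For the back-to-back toy pair the harder particle gives $\cos\theta^* = 1$ and the softer one $-1$, reproducing the asymmetry described above.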

In addition, we can envisage improving the analysis in 
several ways in the context of a full experimental study, 
including data to help constrain the simulations: 

(1) Replace the $m_{bb}^{\rm rec}$ side bins by a likelihood analysis of the well-defined
alternative of either $t\bar{t}H$ signal or $t\bar{t}b\bar{b}$ continuum background after
three b tags. This increases the final number of events, our most severe limitation.

(2) Provided the events can be triggered/tagged, include two hadronic or two leptonic top
decays. This more than triples the available rate and includes a combinatorial advantage
of requiring one of two tops to be boosted.

(3) Without cutting on missing energy as part of the acceptance cuts, use its measurement
within errors to assign the correct jet to the leptonic top and become less dependent on
the third b tag.

Outlook: In this paper we have presented a new strategy to extract the Higgs production
process $t\bar{t}H$ with the decay $H \to b\bar{b}$ at the LHC. After long debates this
signature has recently been abandoned by both LHC experiments, even though it would be an
especially useful ingredient to a complete Higgs sector analysis at the LHC [5]. We
propose two analysis strategies based on a boosted Higgs boson and a boosted top quark;
one with a double and one with a triple b tag. The latter compensates its reduced
statistical significance with a strongly reduced dependence on systematic uncertainties.
The only remaining background after three b tags is continuum $t\bar{t}b\bar{b}$
production with accessible side bins.

For an integrated luminosity of 100 fb$^{-1}$ and a Higgs mass of 120 GeV our three b-tag
analysis gives a statistical significance of at least $4.5\sigma$ and a
signal-to-background ratio of at least S/B = 1/2.4. The signal-to-background ratio can be
further improved using the structure of the QCD radiation for signal and background.
Combinatorial backgrounds are not a problem, and we find a multitude of distributions
distinguishing between signal and continuum background.